The Math of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark


This post consolidates information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to clarify how feature importance is calculated. There are many great resources online that discuss how decision trees and random forests are built, and this post is not intended to be one of them. Although it includes short definitions for context, it assumes the reader has a grasp of these concepts and wants to know how the algorithms are implemented in Scikit-learn and Spark.

Decision trees learn how to best split the dataset into smaller and smaller subsets in order to predict the target value.
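To make the idea of a "best split" concrete, here is a minimal sketch in plain Python. It assumes Gini impurity as the split criterion (one common choice for classification trees) and a single numeric feature. The function names `gini` and `best_threshold` and the toy data are made up for illustration; this is not the Scikit-learn or Spark implementation, only the underlying idea.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the threshold t for the split `value <= t` that minimizes
    the size-weighted Gini impurity of the two child subsets.

    Returns (threshold, weighted_impurity). The threshold is None when
    no candidate split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy data (hypothetical): one feature, two classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node and then recurses into the two resulting subsets. The impurity decrease achieved at each split is the kind of quantity that impurity-based feature importance is built from.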